
Conversation


@pthmas pthmas commented Jan 16, 2026

Overview

  • Adds a height-based pruning mechanism to the ev-node store:
    • Prunes /h/{height}, /d/{height}, /c/{height}, /i/{hash} up to a target height.
    • Also prunes per-height DA metadata (height → DA height mapping) for pruned heights.
    • Leaves /s/{height} (state), /t (current height), and global metadata keys untouched.
    • Tracks progress via a last-pruned-block-height metadata key so pruning is monotonic and idempotent.
  • Integrates pruning into block production:
    • After a block is committed, at configurable intervals, the executor computes a prune target and calls store.PruneBlocks synchronously.
  • Wires pruning into the go-header store:
    • Sets the go-header pruning window to PruningKeepRecent * BlockTime so header/data retention matches ev-node’s window.
  • Adds pruning for EVM execution metadata:
    • EVMStore exposes PruneExecMeta to delete per-height ExecMeta entries up to a target height.
    • The EVM executor implements an optional ExecMetaPruner interface, and the block executor calls it from the same pruning hook (see the interface sketch after this list).
    • Non-EVM (e.g. ABCI) executors don’t implement this interface, so they are unaffected.
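
For orientation, a minimal sketch of the two interfaces described above is shown below. The method names (PruneBlocks, PruneExecMeta) come from this PR; the exact signatures and package layout are assumptions.

```go
// Hedged sketch of the pruning interfaces described in the overview; the
// signatures and package layout are assumptions, only the method names and
// behavior come from this PR.
package store

import "context"

// Pruner removes historical block data up to a target height.
type Pruner interface {
	// PruneBlocks deletes /h/{height}, /d/{height}, /c/{height}, /i/{hash}
	// and per-height DA metadata up to targetHeight, recording progress in
	// the last-pruned-block-height metadata key so calls are monotonic and
	// idempotent.
	PruneBlocks(ctx context.Context, targetHeight uint64) error
}

// ExecMetaPruner is optionally implemented by executors (e.g. the EVM
// executor) that keep per-height execution metadata of their own; executors
// that don't implement it are simply skipped by the pruning hook.
type ExecMetaPruner interface {
	PruneExecMeta(ctx context.Context, targetHeight uint64) error
}
```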

Config

New node config fields / flags:

  • --evnode.node.pruning_enabled
  • --evnode.node.pruning_keep_recent
  • --evnode.node.pruning_interval

Pruning actually runs only when all three are set to non-trivial values (enabled, keep_recent > 0, interval > 0).

Design trade-offs

  • Runs in the critical path:
    Pruning is called synchronously in produceBlock. On pruning heights, block production incurs extra latency that grows with the amount of data being pruned and depends on underlying storage performance.

  • Best-effort pruning:
    If pruning fails, we log the error but do not fail the block. This avoids halting the node but means disk usage can grow silently if pruning keeps failing; operators will need to watch logs/metrics.

  • Scope:
    This PR prunes ev-node’s own store, the go-header store, and EVM ExecMeta. Pruning of ABCI state (ev-abci) is out of scope and will be added in a follow-up once ev-abci exposes its own pruning API.


github-actions bot commented Jan 16, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

| Build | Format | Lint | Breaking | Updated (UTC) |
| --- | --- | --- | --- | --- |
| ✅ passed | ⏩ skipped | ✅ passed | ✅ passed | Jan 26, 2026, 3:01 PM |


claude bot commented Jan 16, 2026

Claude finished @pthmas's task in 2m 5s.


Code Review: Block Pruning Implementation

Review Checklist

  • Read all changed files
  • Analyze code quality & architecture
  • Review error handling & resilience
  • Check security considerations
  • Assess performance & resource efficiency
  • Verify testing & reliability
  • Review documentation

Summary

This PR implements height-based pruning for ev-node's block data store. The implementation is generally well-designed with good idempotent tracking, comprehensive testing, and thoughtful integration across multiple stores. However, there are important concerns about running pruning synchronously in the critical path and some edge cases that need attention.

1. Code Quality & Architecture

✅ Strengths

  • Clean separation of concerns: Pruning is isolated in the Pruner interface (pkg/store/types.go:85-94), making it optional and composable
  • Consistent patterns: EVM execution metadata pruning mirrors the block store pruning design
  • Optional interface pattern: The ExecMetaPruner interface (core/execution/execution.go:165-176) allows graceful degradation for non-EVM executors
  • Good use of metadata keys: Progress tracking via LastPrunedBlockHeightKey and lastPrunedExecMetaKey enables idempotent operations

⚠️ Areas for Improvement

1. Critical Path Synchronous Execution

The pruning runs synchronously during block production (block/internal/executing/executor.go:451-473):

if e.config.Node.PruningEnabled && e.config.Node.PruningKeepRecent > 0 && e.config.Node.PruningInterval > 0 {
    if newHeight%e.config.Node.PruningInterval == 0 {
        // ... pruning code runs synchronously here
    }
}

Concern: Large pruning operations (deleting hundreds/thousands of heights) will block block production, potentially causing:

  • Missed block production windows
  • Timeouts in consensus
  • Poor UX during initial catch-up after enabling pruning

Recommendation: Consider one of:

  • Background goroutine: Move pruning to a background worker that runs asynchronously (a rough sketch follows this list)
  • Rate limiting: Prune a small batch per block (e.g., 10 heights per invocation) to bound latency
  • Separate pruning loop: Run pruning on a separate timer/trigger independent of block production
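
For illustration, the background-goroutine option could look roughly like the sketch below. Only PruneBlocks (via the prune function field) comes from this PR; the channel-based trigger, field names, and wiring are assumptions.

```go
// Hedged sketch of asynchronous pruning: produceBlock would send a prune
// target on the channel instead of calling PruneBlocks inline.
package executing

import (
	"context"

	"github.com/rs/zerolog"
)

type asyncPruner struct {
	prune   func(ctx context.Context, targetHeight uint64) error // e.g. store.PruneBlocks
	targets chan uint64                                          // prune targets sent by the block producer
	logger  zerolog.Logger
}

func (p *asyncPruner) run(ctx context.Context) {
	for {
		select {
		case <-ctx.Done():
			return
		case target := <-p.targets:
			// PruneBlocks is idempotent and monotonic, so a failed or skipped
			// target is simply covered by the next one.
			if err := p.prune(ctx, target); err != nil {
				p.logger.Error().Err(err).Uint64("target_height", target).Msg("background pruning failed")
			}
		}
	}
}
```

On the producer side, a non-blocking send into a buffered targets channel would keep block production from ever waiting on pruning.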

2. Missing Concurrency Safety

store.PruneBlocks() (pkg/store/store.go:283-368) creates a new batch and iterates through heights. There's no explicit locking to prevent concurrent pruning calls if future refactoring enables async pruning or multiple goroutines.

Recommendation: Add a mutex or atomic flag to prevent concurrent pruning operations, even if current code is sequential.
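
A minimal guard, assuming an atomic flag is acceptable, might look like the sketch below; neither the flag nor the wrapper is part of the PR, and in practice the flag would live on the store struct rather than at package level.

```go
// Hedged sketch: skip a prune call if another one is already in flight.
package store

import (
	"context"
	"sync/atomic"
)

var pruneInProgress atomic.Bool

func pruneGuarded(ctx context.Context, prune func(context.Context, uint64) error, target uint64) error {
	if !pruneInProgress.CompareAndSwap(false, true) {
		return nil // another prune is running; the next target will cover this range
	}
	defer pruneInProgress.Store(false)
	return prune(ctx, target)
}
```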

3. go-header Pruning Window Calculation

In pkg/sync/sync_service.go:261-267:

if syncService.conf.Node.PruningEnabled && syncService.conf.Node.PruningKeepRecent > 0 {
    pruningWindow := syncService.conf.Node.BlockTime.Duration * time.Duration(syncService.conf.Node.PruningKeepRecent)
    if pruningWindow > 0 {
        syncOpts = append(syncOpts, goheadersync.WithPruningWindow(pruningWindow))
    }
}

This assumes BlockTime is constant and accurate. If blocks are produced faster/slower than configured (lazy mode, network delays), the time-based window may not align with the height-based ev-node pruning.

Recommendation: Document this assumption or add validation to warn if actual block times diverge significantly from configured times.
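
One hedged alternative to a runtime warning is to pad the derived window with a safety factor, so the go-header store retains at least as much history as ev-node even when real block times run slower than the configured BlockTime. The factor and helper below are assumptions, not part of the PR.

```go
// Hedged sketch: derive the go-header pruning window with a safety margin so
// the time-based header window never ends up shorter than the height-based
// ev-node retention.
package sync

import "time"

const pruningWindowSafetyFactor = 2 // assumption; tune to the expected block-time drift

func pruningWindow(blockTime time.Duration, keepRecent uint64) time.Duration {
	return blockTime * time.Duration(keepRecent) * pruningWindowSafetyFactor
}
```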

2. Error Handling & Resilience

✅ Strengths

  • Best-effort pruning: Failures are logged but don't halt block production (executor.go:458, 468)
  • Idempotent design: PruneBlocks and PruneExecMeta check lastPruned and skip already-pruned ranges
  • Graceful handling of missing data: errors.Is(err, ds.ErrNotFound) checks allow pruning to continue even if some entries are already gone

⚠️ Potential Issues

1. Silent Failures Leading to Unbounded Disk Growth

The current design logs errors but continues:

if err := e.store.PruneBlocks(e.ctx, targetHeight); err != nil {
    e.logger.Error().Err(err).Uint64("target_height", targetHeight).Msg("failed to prune old block data")
}

Concern: If pruning consistently fails (e.g., disk full, permissions issue), operators won't notice until disk fills up. There's no metric, alert, or health check for pruning failures.

Recommendation:

  • Add a Prometheus metric for pruning failures (a rough sketch follows this list)
  • Add a health endpoint that exposes last successful prune time
  • Consider degrading node health status if pruning fails repeatedly
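
A failure counter with client_golang could be as small as the sketch below; the metric name, namespace, and registration point are assumptions.

```go
// Hedged sketch of a pruning-failure counter.
package executing

import "github.com/prometheus/client_golang/prometheus"

var pruneFailures = prometheus.NewCounter(prometheus.CounterOpts{
	Namespace: "evnode",
	Subsystem: "store",
	Name:      "prune_failures_total",
	Help:      "Number of failed block-pruning attempts.",
})

func init() {
	prometheus.MustRegister(pruneFailures)
}
```

The pruning hook would then call pruneFailures.Inc() next to the existing error log, and an alerting rule on the counter replaces log-watching.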

2. Partial Batch Commit Failures

PruneBlocks builds a large batch and commits once (store.go:363-365). If commit fails partway through, the lastPruned marker won't update, causing the next prune to retry the entire range.

Recommendation: This is acceptable for now, but document that large pruning ranges after enabling pruning may have retry overhead. Consider chunking batches in the future.
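
A hedged sketch of what chunking could look like: call PruneBlocks with intermediate targets so each chunk commits (and advances the lastPruned marker) on its own, and a failed commit only retries one chunk. The chunk size and helper are assumptions.

```go
// Hedged sketch: prune a large range in fixed-size chunks. Relies on
// PruneBlocks being idempotent and tracking its own lastPruned marker, as
// described in this PR; the helper itself is not part of the PR.
package store

import "context"

const pruneChunkSize = 1000 // heights per chunk; value is an assumption

func pruneInChunks(ctx context.Context, prune func(context.Context, uint64) error, lastPruned, target uint64) error {
	for next := lastPruned + pruneChunkSize; ; next += pruneChunkSize {
		if next > target {
			next = target
		}
		if err := prune(ctx, next); err != nil {
			return err
		}
		if next == target {
			return nil
		}
	}
}
```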

3. Context Cancellation Not Checked in Loop

The pruning loops in PruneBlocks (store.go:309-356) and PruneExecMeta (evm/store.go:173-180) don't check ctx.Done(). If the node is shutting down during a long pruning operation, it won't cancel gracefully.

Recommendation: Add select { case <-ctx.Done(): return ctx.Err(); default: } inside the pruning loops.
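
In context, the check would sit at the top of the per-height loop, roughly as in the sketch below; deleteHeight stands in for the per-height batch deletes and is not a real function in the PR.

```go
// Hedged sketch of cooperative cancellation inside the pruning loop.
package store

import "context"

func pruneRange(ctx context.Context, lastPruned, target uint64, deleteHeight func(uint64) error) error {
	for h := lastPruned + 1; h <= target; h++ {
		select {
		case <-ctx.Done():
			return ctx.Err() // stop promptly on shutdown; the next prune resumes from lastPruned
		default:
		}
		if err := deleteHeight(h); err != nil {
			return err
		}
	}
	return nil
}
```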

3. Security

✅ No Major Concerns

  • Pruning only affects historical data, not consensus-critical state
  • Proper input validation on config params (config.go:317-328)
  • No user input directly flows into pruning operations

⚠️ Minor Considerations

1. DoS via Aggressive Pruning Config

A misconfigured node with PruningInterval=1 and PruningKeepRecent=1000000 could theoretically cause performance degradation.

Recommendation: Add sanity checks or warnings:

if c.Node.PruningInterval == 1 && c.Node.PruningKeepRecent > 10000 {
    return fmt.Errorf("pruning on every block with large retention is not recommended")
}

2. No Access Control on Config Changes

This is an operational concern, not code: ensure only trusted operators can modify pruning configs to prevent accidental data loss.

4. Performance & Resource Efficiency

⚠️ Major Concerns

1. Synchronous Pruning Latency

As mentioned earlier, pruning runs in the critical path. Benchmarking is needed to understand:

  • How long does pruning 1000 heights take on different storage backends?
  • What's the worst-case latency impact on block production?

Recommendation: Add benchmark tests:

func BenchmarkPruneBlocks(b *testing.B) {
    // Seed 10000 blocks, measure pruning time for various ranges
}

2. Disk I/O Spikes

Deleting thousands of keys in a single batch can cause I/O spikes. The underlying datastore (LevelDB, BadgerDB, etc.) may compact or reorganize, causing unpredictable latency.

Recommendation: Monitor disk I/O metrics and consider rate-limiting pruning operations.

3. Memory Usage During Batch Building

The batch accumulates deletions for all heights in [lastPruned+1, height]. For large ranges, this could allocate significant memory.

Current Impact: Likely minimal for typical pruning intervals (e.g., pruning every 100 blocks with 1000-block retention). But needs monitoring.

5. Testing & Reliability

✅ Strengths

  • Comprehensive unit tests: store_test.go has excellent coverage (TestPruneBlocks_RemovesOldBlockDataOnly, TestPruneBlocks_TracksLastPrunedHeight)
  • EVM store tests: evm/store_test.go validates PruneExecMeta idempotency
  • Edge case coverage: Tests verify:
    • Idempotent pruning (same height twice)
    • Missing entries (already pruned data)
    • Invalid metadata (graceful degradation)

⚠️ Gaps

1. No Integration Tests

There are no integration tests that:

  • Verify pruning actually runs during block production
  • Check that pruned data is inaccessible
  • Validate that go-header store pruning aligns with ev-node pruning

Recommendation: Add an E2E test that:

  1. Produces 100 blocks with PruningInterval=10, PruningKeepRecent=50
  2. Verifies blocks 1-49 are pruned after block 100
  3. Confirms blocks 50-100 remain accessible

2. No Performance/Load Tests

Missing tests for:

  • Pruning performance under load
  • Behavior when pruning large backlogs (e.g., enabling pruning on an archive node)

Recommendation: Add a stress test that simulates enabling pruning on a node with 100k blocks.

3. No Tests for Failure Recovery

What happens if:

  • Pruning fails halfway through a batch commit?
  • The node crashes during pruning?

Recommendation: Add tests with mocked batch.Commit() failures to verify recovery behavior.

6. Documentation

✅ Strengths

  • Clear config comments: config.go:239-244 explains each pruning parameter
  • Inline comments: Good explanations in store.go:277-282 and executor.go:447-450
  • PR description: Excellent overview of design trade-offs

⚠️ Improvements Needed

1. Missing Operational Docs

There's no documentation on:

  • How to enable pruning on an existing node
  • Performance impact expectations
  • How to monitor pruning health
  • What to do if pruning fails

Recommendation: Add a docs/pruning.md file covering operational aspects.

2. Unclear Interaction Between Stores

It's not immediately obvious how the three pruning mechanisms (ev-node store, go-header store, EVM store) interact or if they can get out of sync.

Recommendation: Add a comment in executor.go explaining the coordination:

// Pruning coordination:
// 1. ev-node store prunes /h/, /d/, /c/, /i/ up to targetHeight
// 2. go-header store prunes headers within the time-based window
// 3. EVM store prunes ExecMeta up to targetHeight (if executor implements ExecMetaPruner)
// All three use independent progress tracking, so partial failures are isolated.

3. No Upgrade Path Documentation

What happens when upgrading from a non-pruning node to a pruning-enabled node?

Recommendation: Document:

  • Backward compatibility (pruning can be enabled/disabled anytime)
  • Forward compatibility (pruned nodes can't serve full history to peers)

Critical Issues Summary

| Priority | Issue | Impact | Recommendation |
| --- | --- | --- | --- |
| 🔴 High | Synchronous pruning in critical path | Block production delays | Move to background or rate-limit |
| 🟡 Medium | Silent pruning failures | Unbounded disk growth | Add metrics and health checks |
| 🟡 Medium | Missing integration tests | Unknown real-world behavior | Add E2E pruning tests |
| 🟡 Medium | No context cancellation in loops | Slow shutdown during pruning | Add ctx.Done() checks |
| 🟢 Low | go-header time-based window drift | Potential misalignment | Document assumption |

Recommendations for Follow-up PRs

  1. Async pruning (High priority): Refactor to run pruning in a background worker
  2. Monitoring (High priority): Add Prometheus metrics for pruning operations
  3. Integration tests (Medium priority): Verify end-to-end pruning behavior
  4. Operational docs (Medium priority): Add pruning guide for operators

Final Verdict

This is a well-engineered feature with solid fundamentals:

  • ✅ Clean architecture with optional interfaces
  • ✅ Idempotent and resumable design
  • ✅ Comprehensive unit test coverage
  • ✅ Proper error handling at the code level

However, the synchronous execution in the critical path is a significant concern that should be addressed before merging to production or documented as a known limitation with plans to refactor.

Approval Status: Approve with recommendations

  • Safe to merge if synchronous pruning latency is acceptable for expected workloads
  • Strongly recommend prioritizing async pruning in a follow-up PR
  • Add integration tests and monitoring before deploying to production

@julienrbrt (Member) commented Jan 16, 2026

Given that we have a lot of different stores and keys, are we planning to prune those as well:

(and eventually ev-abci store)

It will be inconsistent if we don't allow height pruning of all those stores (at once).

@pthmas (Contributor, Author) replied

I've updated this PR to prune metadata, the go-header store, and the EVM store ✅
I'm happy to work on pruning ev-abci next, but I would probably do it in another PR to prevent this PR from getting too big, if that's ok.

@pthmas pthmas changed the title Block Prunning feat: block Prunning Jan 26, 2026
@pthmas pthmas changed the title feat: block Prunning feat: block Pruning Jan 26, 2026
@pthmas pthmas force-pushed the pierrick/prunning branch from 3e1e8e0 to ed66fe6 on January 26, 2026 at 15:01
